Add Zarr v2 archive format support #286

Closed
egparedes wants to merge 2 commits into GridTools:master from EGPlace:claude/serialbox-netcdf-zarr-design-X584N

@egparedes
Summary

This PR adds support for the Zarr v2 storage format as a new archive backend in Serialbox. Zarr is a cloud-friendly, chunked array storage format that enables efficient I/O and interoperability with scientific Python tools.

Key Changes

  • New ZarrArchive class (src/serialbox/core/archive/ZarrArchive.h/cpp):

    • Implements the Archive interface for Zarr v2 format
    • Stores each field as a Zarr array in a subdirectory named <prefix>_<field>.zarr/
    • Supports multiple saves of the same field with the first dimension representing the save index
    • Handles data serialization/deserialization with proper byte-order handling
    • Includes metadata management via JSON files (.zarray for Zarr metadata, ArchiveMetaData-<prefix>.json for Serialbox metadata)
  • Core Features:

    • Supports multiple data types: Boolean, Int32, Int64, Float32, Float64
    • Handles non-contiguous storage views through column-major iteration
    • Implements endianness detection for proper dtype specification
    • Provides both archive-based I/O (with save IDs) and direct file-based I/O (writeToFile/readFromFile)
    • Proper error handling and validation of archive metadata
  • Integration:

    • Updated ArchiveFactory to recognize and create Zarr archives
    • Added comprehensive unit tests covering construction, metadata handling, read/write operations, and various data types/dimensions
    • Updated CMake build configuration to include new source files
  • Directory Layout:

    <directory>/
      ArchiveMetaData-<prefix>.json
      <prefix>_<field>.zarr/
        .zarray                    (Zarr metadata)
        0.0.0...0                  (chunk for save 0)
        1.0.0...0                  (chunk for save 1)
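
To illustrate the endianness detection and the `.zarray` metadata described above, here is a minimal sketch and not the code from this PR. The helper names (`hostByteOrderChar`, `zarrDtype`, `makeZarrayJson`) are hypothetical; the JSON keys are the ones required by the Zarr v2 specification. The `fill_value` of `null` and the default `order` of `"C"` are assumptions, since the description does not say which values the implementation writes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Detect the host byte order at run time: '<' little-endian, '>' big-endian.
inline char hostByteOrderChar() {
  const std::uint16_t probe = 1;
  unsigned char firstByte = 0;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1 ? '<' : '>';
}

// Build a NumPy-style dtype string such as "<f8" from a kind character
// ('b' bool, 'i' signed int, 'f' float) and an item size in bytes.
inline std::string zarrDtype(char kind, std::size_t itemSize) {
  // Single-byte types carry no byte order and are marked with '|'.
  const char order = itemSize == 1 ? '|' : hostByteOrderChar();
  return std::string(1, order) + kind + std::to_string(itemSize);
}

// Render a minimal .zarray document for an uncompressed array whose leading
// dimension counts saves, with one chunk per save.
// `order` ('C' or 'F') is an assumption; the PR does not state which is used.
inline std::string makeZarrayJson(const std::string& dtype, std::size_t numSaves,
                                  const std::vector<std::size_t>& dims,
                                  char order = 'C') {
  // Format "[first, dims...]" as a JSON list.
  auto list = [&dims](std::size_t first) {
    std::ostringstream ss;
    ss << "[" << first;
    for (std::size_t d : dims) ss << ", " << d;
    ss << "]";
    return ss.str();
  };
  std::ostringstream ss;
  ss << "{\n"
     << "  \"zarr_format\": 2,\n"
     << "  \"shape\": " << list(numSaves) << ",\n"
     << "  \"chunks\": " << list(1) << ",\n"
     << "  \"dtype\": \"" << dtype << "\",\n"
     << "  \"compressor\": null,\n"
     << "  \"fill_value\": null,\n"
     << "  \"filters\": null,\n"
     << "  \"order\": \"" << order << "\"\n"
     << "}\n";
  return ss.str();
}
```

For example, `makeZarrayJson(zarrDtype('f', 8), 2, {3, 4})` would describe two saves of a 3x4 Float64 field, each save occupying its own chunk.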
    

Implementation Details

  • Data is stored without compression in native byte order for simplicity and performance
  • Chunk naming follows Zarr v2 specification with indices separated by dots
  • The implementation properly handles arrays with varying dimensions (2D to 7D tested)
  • Supports both row-major and column-major storage layouts through the StorageView abstraction
  • Thread-safety is not currently supported (marked as false in the implementation)
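
The chunk naming and directory layout above could be computed roughly as follows. This is a sketch under assumptions: `chunkKey` and `chunkPath` are hypothetical names, not functions from the PR, and the layout is taken only from the description (one chunk per save, all data-dimension indices zero).

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

// Zarr v2 chunk key for save `saveId` of a field with `numDataDims` data
// dimensions: the save index followed by one ".0" per data dimension,
// e.g. saveId = 1, numDataDims = 3 gives "1.0.0.0".
inline std::string chunkKey(std::size_t saveId, std::size_t numDataDims) {
  std::string key = std::to_string(saveId);
  for (std::size_t i = 0; i < numDataDims; ++i) key += ".0";
  return key;
}

// Full path of a chunk file: <directory>/<prefix>_<field>.zarr/<chunk key>.
inline std::filesystem::path chunkPath(const std::filesystem::path& directory,
                                       const std::string& prefix,
                                       const std::string& field,
                                       std::size_t saveId,
                                       std::size_t numDataDims) {
  return directory / (prefix + "_" + field + ".zarr") /
         chunkKey(saveId, numDataDims);
}
```

Because every save is a whole chunk, appending a save only requires writing one new file and updating the leading entry of `shape` in `.zarray`.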

https://claude.ai/code/session_012z6neCsMd8cRDKcYFRaqrz

egparedes and others added 2 commits February 18, 2026 18:08
Implements a native Zarr v2 archive (ZarrArchive) that requires no
external library beyond standard C++17. Each field is stored as a
Zarr array in a subdirectory named `<prefix>_<field>.zarr/` under the
archive directory. Multiple saves of the same field are tracked via a
leading time dimension; individual saves map to separate chunk files
(`<id>.0.0...0`) following the Zarr v2 chunk naming convention.

Changes:
- src/serialbox/core/archive/ZarrArchive.h: new archive class
- src/serialbox/core/archive/ZarrArchive.cpp: full implementation
  * pure C++17, no external dependencies
  * native byte-order dtype strings in .zarray metadata
  * handles both contiguous and strided (padded) StorageViews
  * supports Read / Write / Append open modes
  * writeToFile / readFromFile for stateless single-save I/O
- src/serialbox/core/archive/ArchiveFactory.cpp: register Zarr;
  add .zarr extension mapping in archiveFromExtension
- src/serialbox/core/archive/ArchiveFactory.h: update docstring
- src/serialbox/core/CMakeLists.txt: compile ZarrArchive sources
- test/serialbox/core/archive/UnittestZarrArchive.cpp: unit tests
  mirroring the NetCDF test suite (construction, metadata validation,
  .zarray content, writeToFile/readFromFile, typed read/write round-trips)
- test/serialbox/core/CMakeLists.txt: include new test file

https://claude.ai/code/session_012z6neCsMd8cRDKcYFRaqrz
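
The handling of strided (padded) views mentioned in the commit message can be pictured as packing the view into a contiguous buffer before writing the chunk. The following is an illustrative sketch only: `packColumnMajor` is a hypothetical helper that takes raw extents and element strides instead of Serialbox's `StorageView`, and it assumes the first index varies fastest.

```cpp
#include <cstddef>
#include <vector>

// Copy an N-dimensional strided view into a contiguous buffer, visiting
// elements in column-major order (first index varies fastest).
// dims[i] is the extent and strides[i] the stride in elements (not bytes)
// of dimension i; padding shows up as strides larger than the extents imply.
template <class T>
std::vector<T> packColumnMajor(const T* origin,
                               const std::vector<std::size_t>& dims,
                               const std::vector<std::ptrdiff_t>& strides) {
  std::size_t total = 1;
  for (std::size_t d : dims) total *= d;

  std::vector<T> out;
  out.reserve(total);
  if (total == 0) return out;

  std::vector<std::size_t> index(dims.size(), 0);
  for (std::size_t n = 0; n < total; ++n) {
    // Linear offset of the current multi-index inside the strided storage.
    std::ptrdiff_t offset = 0;
    for (std::size_t i = 0; i < dims.size(); ++i)
      offset += static_cast<std::ptrdiff_t>(index[i]) * strides[i];
    out.push_back(origin[offset]);

    // Advance the multi-index like an odometer, first dimension fastest.
    for (std::size_t i = 0; i < dims.size(); ++i) {
      if (++index[i] < dims[i]) break;
      index[i] = 0;
    }
  }
  return out;
}
```

With this approach the on-disk chunk never contains padding, whatever layout the in-memory field uses; reading reverses the loop and scatters the buffer back through the same offsets.
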
Copilot AI review requested due to automatic review settings February 18, 2026 22:12
@egparedes egparedes closed this Feb 18, 2026
Copilot AI left a comment

Pull request overview

This PR adds support for the Zarr v2 storage format to Serialbox as a new archive backend. Zarr is a cloud-friendly, chunked array storage format that enables efficient I/O and interoperability with scientific Python tools. The implementation follows established patterns from existing archive backends (BinaryArchive and NetCDFArchive) and integrates seamlessly with the existing ArchiveFactory infrastructure.

Changes:

  • Implements a new ZarrArchive class that stores fields as Zarr v2 arrays in subdirectories, with support for multiple saves per field
  • Integrates Zarr archive into ArchiveFactory for archive creation and file extension resolution
  • Adds comprehensive unit tests covering construction, metadata handling, read/write operations, and various data types/dimensions

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/serialbox/core/archive/ZarrArchive.h | Header defining the ZarrArchive class with Archive interface implementation, static utility methods, and helper functions |
| src/serialbox/core/archive/ZarrArchive.cpp | Implementation of ZarrArchive including endianness detection, data serialization, metadata management, and Zarr v2 format compliance |
| src/serialbox/core/archive/ArchiveFactory.h | Updated documentation to include .zarr extension mapping |
| src/serialbox/core/archive/ArchiveFactory.cpp | Integrated ZarrArchive into factory methods for creation and file I/O |
| src/serialbox/core/CMakeLists.txt | Added ZarrArchive source files to build configuration |
| test/serialbox/core/CMakeLists.txt | Added ZarrArchive unit test to test suite |
| test/serialbox/core/archive/UnittestZarrArchive.cpp | Comprehensive tests for ZarrArchive covering construction, metadata validation, read/write operations, and multiple data types/dimensions |

const std::size_t numDataDims = activeDims.size();

// Create directory
std::filesystem::create_directories(fieldDir);
Copilot AI Feb 18, 2026

The create_directories call should be wrapped in a try-catch block to handle std::filesystem::filesystem_error exceptions consistently with the constructor (lines 187-189) and the write method (lines 290-294). This ensures that filesystem errors are properly caught and converted to Serialbox Exception types with appropriate error messages.
